A Logical Formalisation of the Fellegi-Holt Method of Data Cleaning

Authors

  • Agnes Boskovitz
  • Rajeev Goré
  • Markus Hegland
Abstract

The Fellegi-Holt method automatically “corrects” data that fail some predefined requirements. Computer implementations of the method were used in many national statistics agencies but are less used now because they are slow. We recast the method in propositional logic, and show that many of its results are well-known results in propositional logic. In particular we show that the Fellegi-Holt method of “edit generation” is essentially the same as a technique for automating logical deduction called resolution. Since modern implementations of resolution are capable of handling large problems efficiently, they might lead to more efficient implementations of the Fellegi-Holt method.
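The correspondence between edit generation and resolution can be sketched in Python. This minimal sketch assumes every field has been encoded as a Boolean variable; the Fellegi-Holt method itself also handles multi-valued fields. The field names `minor`, `married` and `spouse_listed` and all helper names are hypothetical and chosen only for illustration. The code is textbook propositional resolution with subsumption deletion, not the authors' implementation. A failing edit ("these conditions must not hold together") becomes the clause that negates each of its conditions. Resolving two such clauses on a shared field then yields an implied edit that no longer mentions that field.

```python
from itertools import combinations

# A literal is a (field, value) pair; a clause (an edit in logical form) is a
# frozenset of literals and is satisfied if at least one literal holds.


def negate(lit):
    name, value = lit
    return (name, not value)


def failing_edit(*conditions):
    """Encode 'these conditions must not all hold' as the clause of their negations."""
    return frozenset(negate(c) for c in conditions)


def is_tautology(clause):
    return any(negate(lit) in clause for lit in clause)


def resolve(c1, c2):
    """All non-tautological resolvents of two clauses (one per clashing field)."""
    resolvents = set()
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            if not is_tautology(r):
                resolvents.add(frozenset(r))
    return resolvents


def saturate(clauses):
    """Close a clause set under resolution, keeping only subsumption-minimal clauses."""
    current = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(current, 2):
            new |= resolve(c1, c2)
        combined = current | new
        # A clause that strictly contains another clause is implied by it; drop it.
        minimal = {c for c in combined if not any(d < c for d in combined)}
        if minimal == current:
            return current
        current = minimal


def violated(record, clauses):
    """Clauses (edits) that a record with Boolean field values fails."""
    return [c for c in clauses
            if not any(record[name] == value for name, value in c)]


# Hypothetical edits: a minor must not be married, and a record with a
# spouse listed must not be marked as not married.
e1 = failing_edit(("married", True), ("minor", True))
e2 = failing_edit(("spouse_listed", True), ("married", False))

if __name__ == "__main__":
    for clause in saturate({e1, e2}):
        print(sorted(clause))
    record = {"minor": True, "spouse_listed": True, "married": False}
    for clause in violated(record, saturate({e1, e2})):
        print("violated:", sorted(clause))
```

Resolving `e1` and `e2` on `married` produces the implied edit "a minor must not have a spouse listed". The sample record violates that implied edit. Because the implied edit does not mention `married`, changing `married` alone cannot repair the record. This is the kind of information that Fellegi-Holt error localization draws from the full set of implied edits.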

Similar resources

Data Cleaning Methods

Data Cleaning methods are used for finding duplicates within a file or across sets of files. This overview provides background on the Fellegi-Sunter model of record linkage. The Fellegi-Sunter model provides an optimal theoretical classification rule. Fellegi and Sunter introduced methods for automatically estimating optimal parameters without training data that we extend to many real world sit...

Implied Edit Generation and Error Localization for Ratio and Balancing Edits

The U.S. Census Bureau has developed SPEER software that applies the Fellegi-Holt editing method to economic establishment surveys under ratio edit and a limited form of balancing. It is known that more than 99% of economic data only require these basic forms of edits. If implicit edits are available, then Fellegi-Holt methods have the advantage that they determine the minimal number of fields ...

Extending the Fellegi-Holt Model of Statistical Data Editing

This paper provides extensions to the theory and the computational aspects of the Fellegi-Holt Model of Editing (JASA 1976). If implicit edits can be generated prior to editing, then error localization (finding the minimum number of fields to impute) can be quite rapid. In some situations, not all of the implicit edits can be generated because of the great number (> 10^30) of distinct edit patt...

Error Localization and Implied Edit Generation for Ratio and Balancing Edits

The U.S. Census Bureau has developed SPEER software that applies the Fellegi-Holt editing method to economic establishment surveys under ratio edit and a limited form of balancing. It is known that more than 99% of economic data only require these basic forms of edits. If implicit edits are available, then Fellegi-Holt methods have the advantage that they determine the minimal number of fields ...

A Comparison Study of ACS If-Then-Else, NIM, and DISCRETE Edit and Imputation Systems Using ACS Data

In any statistical survey, the information gathered may contain inconsistent, incorrect, or missing data. These erroneous data need to be revised or filled in prior to data tabulations and retrieval. The revisions of the erroneous data should not affect the statistical inferences of the data. The missing data, as well as some inconsistent or incorrect data, are easy to identify while others are n...

Publication date: 2003